Open-source DeepResearch – Freeing our search agents
https://huggingface.co/blog/open-deep-research
A 24-hour challenge in response to Introducing deep research (OpenAI)
Benchmark: General AI Assistants benchmark (GAIA)
https://github.com/huggingface/smolagents/tree/c41a50a0dbdebbdbb3e5a939c790184021b2f870/examples/open_deep_research (ref: Results 🏅)
Reportedly implemented with a text browser (ref: Making the right tools 🛠️)
Related? https://github.com/huggingface/smolagents/pull/317
What are Agent frameworks and why they matter?
What's next for AI agentic workflows ft. Andrew Ng of AI Fund
Introducing smolagents, a simple library to build agents
(TODO: skipped the part about metrics)
Building an open Deep Research
Using a CodeAgent
Executable Code Actions Elicit Better LLM Agents
An agent that expresses its actions as code
aymeric-roucher/agent_reasoning_benchmark
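A minimal sketch of the difference between JSON actions and code actions. It assumes two toy tools, `search` and `word_count`, which are hypothetical and unrelated to the smolagents API. With JSON actions each step is a single tool call, so chaining tools takes several model turns. With code actions one snippet can chain the calls and keep intermediate values in variables.

```python
import json

# Hypothetical toy tools, for illustration only (not part of smolagents).
def search(query: str) -> str:
    """Pretend search: return a canned snippet for the query."""
    corpus = {"gaia": "GAIA is a benchmark for general AI assistants"}
    return corpus.get(query.lower(), "")

def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())

TOOLS = {"search": search, "word_count": word_count}

def run_json_action(action_json: str):
    """Execute one JSON action: exactly one tool call per step."""
    action = json.loads(action_json)
    return TOOLS[action["tool"]](**action["arguments"])

def run_code_action(code: str):
    """Execute one code action: the snippet may chain several tool calls.

    exec is used here only for illustration; it is not a sandbox.
    """
    namespace = dict(TOOLS)
    exec(code, namespace)
    return namespace.get("result")

# JSON style: two steps, the intermediate snippet goes back through the caller.
snippet = run_json_action('{"tool": "search", "arguments": {"query": "gaia"}}')
json_result = run_json_action(
    json.dumps({"tool": "word_count", "arguments": {"text": snippet}})
)

# Code style: one step, the intermediate value stays inside the snippet.
code_result = run_code_action("result = word_count(search('gaia'))")
```

Both styles reach the same answer here; the code action just needs one step instead of two.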
Making the right tools 🛠️
1. A web browser
we started with an extremely simple text-based web browser for now for our first proof-of-concept
SimpleTextBrowser smolagents/examples/open_deep_research/scripts/text_web_browser.py
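To make "text-based web browser" concrete, here is a minimal sketch using only the standard library. `ToyTextBrowser` and its paging methods are hypothetical names chosen for illustration. The sketch is not taken from text_web_browser.py, and it works on an HTML string rather than fetching pages.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

class ToyTextBrowser:
    """Hypothetical paged viewer over the text of one HTML page."""

    def __init__(self, viewport_size: int = 40):
        self.viewport_size = viewport_size  # characters shown per page
        self.text = ""
        self.page = 0

    def load_html(self, html: str) -> str:
        """Strip markup, reset to the first page, and return it."""
        parser = _TextExtractor()
        parser.feed(html)
        parser.close()
        self.text = " ".join(parser.chunks)
        self.page = 0
        return self.viewport()

    def page_count(self) -> int:
        # Ceiling division, with at least one (possibly empty) page.
        return max(1, -(-len(self.text) // self.viewport_size))

    def viewport(self) -> str:
        start = self.page * self.viewport_size
        return self.text[start:start + self.viewport_size]

    def page_down(self) -> str:
        self.page = min(self.page + 1, self.page_count() - 1)
        return self.viewport()

    def page_up(self) -> str:
        self.page = max(self.page - 1, 0)
        return self.viewport()
```

An agent would call `load_html` once and then `page_down` / `page_up` as tools, so it only ever sees one small window of text at a time.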
2. A simple text inspector
https://github.com/huggingface/smolagents/blob/gaia-submission-r1/examples/open_deep_research/scripts/text_inspector_tool.py
Looks like a tool that reads files with a wide variety of extensions
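A minimal sketch of the idea of a file reader that dispatches on extension, assuming only a few plain formats. `inspect_file` and the `READERS` table are hypothetical names for illustration; the sketch does not reproduce text_inspector_tool.py or the formats it supports.

```python
import csv
import json
from pathlib import Path

def _read_plain(path: Path) -> str:
    """Return the file contents unchanged."""
    return path.read_text(encoding="utf-8")

def _read_json(path: Path) -> str:
    """Parse JSON and return it pretty-printed with sorted keys."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return json.dumps(data, indent=2, sort_keys=True)

def _read_csv(path: Path) -> str:
    """Render CSV rows as pipe-separated lines."""
    with path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    return "\n".join(" | ".join(row) for row in rows)

# Map each supported extension to the function that turns it into text.
READERS = {
    ".txt": _read_plain,
    ".md": _read_plain,
    ".json": _read_json,
    ".csv": _read_csv,
}

def inspect_file(path_str: str) -> str:
    """Return a text rendering of the file, chosen by its extension."""
    path = Path(path_str)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported extension: {path.suffix!r}")
    return reader(path)
```

Supporting another format in this sketch means adding one entry to `READERS`, which keeps the tool's interface to the agent a single function.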
These tools were taken from the excellent Magentic-One agent by Microsoft Research,
Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks
Results 🏅
We’ve quickly gone up from the previous SoTA with an open framework, around 46% for Magentic-One, to our current performance of 55.15% on the validation set.
The authors attribute this to using a CodeAgent
when switching to a standard agent that writes actions in JSON instead of code, performance of the same setup is instantly degraded to 33% average on the validation set.
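For quick reference, the gaps between the quoted validation-set scores can be computed as below. The dictionary keys are labels made up for these notes. The 46% and 33% figures are quoted as approximate ("around", "average"), so the gaps are approximate too.

```python
# Validation-set scores quoted above, in percent.
SCORES = {
    "code_agent": 55.15,    # open Deep Research with code actions
    "magentic_one": 46.0,   # previous open-framework SoTA ("around 46%")
    "json_agent": 33.0,     # same setup with JSON actions ("33% average")
}

def point_gap(a: str, b: str) -> float:
    """Difference in percentage points between two quoted scores."""
    return round(SCORES[a] - SCORES[b], 2)

gap_vs_previous_sota = point_gap("code_agent", "magentic_one")
gap_vs_json_actions = point_gap("code_agent", "json_agent")
```

That is roughly 9 points over Magentic-One and roughly 22 points over the JSON-action variant of the same setup.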